ReLU function



Schauder Bases for $C[0, 1]$ Using ReLU, Softplus and Two Sigmoidal Functions

Ganesh, Anand, Bose, Babhrubahan, Rajagopalan, Anand

arXiv.org Artificial Intelligence

We construct four Schauder bases for the space $C[0,1]$, one using ReLU functions, another using Softplus functions, and two more using sigmoidal versions of the ReLU and Softplus functions. This establishes the existence of a basis using these functions for the first time, and improves on the universal approximation property associated with them. We also show an $O(\frac{1}{n})$ approximation bound based on our ReLU basis, and a negative result on constructing multivariate functions using finite combinations of ReLU functions.
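A classical fact underlying such ReLU constructions: every piecewise-linear "tent" (hat) function, the building block of the standard Faber-Schauder basis for $C[0,1]$, is a finite combination of three ReLUs. A minimal sketch of that fact (the paper's actual basis construction is more involved and is not reproduced here):

```python
def relu(x):
    return max(x, 0.0)

def hat(x, a, m, b):
    # tent function on [a, b] with peak 1 at m, built from three ReLUs:
    # the slope changes by +1/(m-a) at a, by -(1/(m-a)+1/(b-m)) at m,
    # and by +1/(b-m) at b, so the function vanishes outside [a, b]
    return (relu(x - a) / (m - a)
            - (1.0 / (m - a) + 1.0 / (b - m)) * relu(x - m)
            + relu(x - b) / (b - m))

# linear on [a, m] and [m, b], zero outside [a, b]
assert hat(0.5, 0.0, 0.5, 1.0) == 1.0
assert hat(-0.3, 0.0, 0.5, 1.0) == 0.0
```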



Supplementary Material of Rational neural networks

Neural Information Processing Systems

Finally, we use the identity $\mathrm{ReLU}(x) = \frac{|x| + x}{2}$, $x \in \mathbb{R}$, to define a rational approximation to the ReLU function on the interval $[-1, 1]$ as $\tilde{r}(x) = \frac{1}{2}\left(\frac{x\, r(x)}{1 + \epsilon} + x\right)$. Therefore, we have the following inequalities for $x \in [-1, 1]$: $|\mathrm{ReLU}(x) - \tilde{r}(x)| = \frac{1}{2}\left|\, |x| - \frac{x\, r(x)}{1 + \epsilon} \right| \le \frac{1}{2(1+\epsilon)}\left( \big|\, |x| - x\, r(x) \big| + \epsilon |x| \right) \le \frac{\epsilon}{1+\epsilon} \le \epsilon$. We now show that ReLU neural networks can approximate rational functions. The structure of the proof closely follows [12, Lemma 1.3]. The statement of Theorem 3 comes in two parts, and we prove them separately.
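The bound can be checked numerically. The sketch below substitutes a Newton-Schulz polynomial iterate for the paper's rational approximation $r(x) \approx \mathrm{sign}(x)$ (a polynomial is a special case of a rational function, but this is a stand-in, not the paper's construction), estimates $\epsilon$ on a grid, and verifies $|\mathrm{ReLU}(x) - \tilde{r}(x)| \le \epsilon$:

```python
def relu(x):
    return max(x, 0.0)

def sign_approx(x, iters=8):
    # Newton-Schulz iteration y <- y(3 - y^2)/2 converges to sign(x) on [-1, 1];
    # a polynomial stand-in for the paper's rational approximation r(x)
    y = x
    for _ in range(iters):
        y = y * (3.0 - y * y) / 2.0
    return y

grid = [i / 1000.0 for i in range(-1000, 1001)]
# epsilon such that | |x| - x r(x) | <= eps on the grid
eps = max(abs(abs(x) - x * sign_approx(x)) for x in grid)

def relu_approx(x):
    # r~(x) = (1/2) (x r(x) / (1 + eps) + x)
    return 0.5 * (x * sign_approx(x) / (1.0 + eps) + x)

err = max(abs(relu(x) - relu_approx(x)) for x in grid)
assert err <= eps  # matches |ReLU(x) - r~(x)| <= eps/(1+eps) <= eps
```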


Review for NeurIPS paper: Continual Learning in Low-rank Orthogonal Subspaces

Neural Information Processing Systems

Weaknesses: Despite having a novel core idea, I think this paper is not ready for publication and needs substantial improvement: 1. Currently it seems that you need to know T in advance, because the projection matrices P_t must be constructed before continual learning starts. This is a huge limitation, because the very notion of "continual learning" implies that T is not known a priori: the learning agent is supposedly learning over unlimited time periods (i.e., we may even have T \rightarrow \infty). Learning task T+1 would invalidate your core idea, because building an orthogonal P_{T+1} does not seem to be trivial. In my opinion, this constraint should be removed; as it stands, it is a highly slippery assumption.
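The reviewer's point can be made concrete: when per-task orthogonal subspaces are carved out of a single orthonormal basis, the number of tasks T (and the per-task rank) must be fixed before training begins. A hypothetical sketch, not the paper's code; `gram_schmidt`, `project`, and the dimensions are all illustrative:

```python
import random

def gram_schmidt(vectors):
    # orthonormalize a list of vectors (classical Gram-Schmidt)
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

def project(v, sub):
    # orthogonal projection of v onto span(sub)
    out = [0.0] * len(v)
    for b in sub:
        dot = sum(vi * bi for vi, bi in zip(v, b))
        out = [oi + dot * bi for oi, bi in zip(out, b)]
    return out

random.seed(0)
d, T, r = 12, 3, 4  # T (with T*r <= d) must be known BEFORE training starts
basis = gram_schmidt([[random.gauss(0, 1) for _ in range(d)] for _ in range(T * r)])
subspaces = [basis[t * r:(t + 1) * r] for t in range(T)]

# a vector projected into task 0's subspace has no component in task 1's subspace
g = [random.gauss(0, 1) for _ in range(d)]
leak = max(abs(c) for c in project(project(g, subspaces[0]), subspaces[1]))
assert leak < 1e-9
```

A task T+1 would need an r-dimensional subspace orthogonal to all existing ones, which is impossible once T*r reaches d.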


A Hard-Label Cryptanalytic Extraction of Non-Fully Connected Deep Neural Networks using Side-Channel Attacks

Coqueret, Benoit, Carbone, Mathieu, Sentieys, Olivier, Zaid, Gabriel

arXiv.org Artificial Intelligence

During the past decade, Deep Neural Networks (DNNs) proved their value on a large variety of subjects. However, despite their high value and public accessibility, protecting the intellectual property of DNNs is still an open issue and an emerging research field. Recent works have successfully extracted fully-connected DNNs using cryptanalytic methods in hard-label settings, proving that it is possible to copy a DNN with high fidelity, i.e., high similarity in the output predictions. However, current cryptanalytic attacks cannot target complex, i.e., not fully connected, DNNs and are limited to special cases of neurons present in deep networks. In this work, we introduce a new end-to-end attack framework designed for model extraction of embedded DNNs with high fidelity. We describe a new black-box side-channel attack which splits the DNN into several linear parts for which we can perform cryptanalytic extraction and retrieve the weights in hard-label settings. With this method, we are able to adapt cryptanalytic extraction, for the first time, to non-fully connected DNNs while maintaining high fidelity. We validate our contributions by targeting several architectures implemented on a microcontroller unit, including a Multi-Layer Perceptron (MLP) of 1.7 million parameters and a shortened MobileNetv1. Our framework successfully extracts all of these DNNs with high fidelity (88.4% for the MobileNetv1 and 93.2% for the MLP). Furthermore, we use the stolen models to generate adversarial examples and achieve close to white-box performance on the victim's models (95.8% and 96.7% transfer rates).
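The phrase "linear parts for which we can perform cryptanalytic extraction" rests on the fact that a linear layer is fully determined by a handful of queries. The idealized sketch below recovers W and b from full-precision outputs; the hard-label, side-channel setting of the paper is much harder, and `layer`, `W`, and `b` here are invented for illustration:

```python
def layer(x):
    # hypothetical black box computing y = W x + b (weights unknown to the attacker)
    W = [[2.0, -1.0], [0.5, 3.0]]
    b = [0.1, -0.2]
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

b_rec = layer([0.0, 0.0])  # querying at zero reveals the bias
cols = []
for j in range(2):
    e = [0.0, 0.0]
    e[j] = 1.0
    y = layer(e)  # querying a basis vector reveals one column of W (plus b)
    cols.append([yi - bi for yi, bi in zip(y, b_rec)])
W_rec = [[cols[j][i] for j in range(2)] for i in range(2)]
```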


Zorro: A Flexible and Differentiable Parametric Family of Activation Functions That Extends ReLU and GELU

Roodschild, Matias, Gotay-Sardiñas, Jorge, Jimenez, Victor A., Will, Adrian

arXiv.org Artificial Intelligence

Activation functions are an integral part of nearly all neural networks, from traditional architectures such as Convolutional Neural Networks to recent ones such as Transformers and Extended LSTM (xLSTM). They enable more effective training and capture nonlinear data patterns. More than 400 activation functions, with fixed or trainable parameters, have been proposed over the last 30 years, but only a few are widely used: ReLU is one of the most frequent, with GELU and Swish variants appearing increasingly often. However, ReLU presents non-differentiable points and exploding-gradient issues, while GELU and Swish variants produce varying results across parameter settings and need additional parameters to adapt to different datasets and architectures. This article introduces Zorro, a novel family of activation functions that is continuously differentiable and flexible, comprising five main functions that fuse ReLU and Sigmoid. Zorro functions are smooth and adaptable, act as information gates aligning with ReLU in the 0-1 range, and offer an alternative to ReLU that avoids the need for normalization as well as neuron death and gradient explosion. Zorro also approximates functions like Swish, GELU, and DGELU, providing parameters to adjust to different datasets and architectures. We tested it on fully connected, convolutional, and transformer architectures to demonstrate its effectiveness.
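The idea that a smooth sigmoid-gated function can stand in for ReLU can be illustrated with Swish, one of the variants the abstract mentions: $x\,\sigma(\beta x)$ is everywhere differentiable yet converges to ReLU pointwise as $\beta$ grows. (This illustrates the general idea only; Zorro's own formulas are not reproduced here.)

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(x, 0.0)

def swish(x, beta=1.0):
    # smooth sigmoid-gated unit; approaches ReLU as beta -> infinity
    return x * sigmoid(beta * x)

grid = [i / 100.0 for i in range(-300, 301)]
gaps = [max(abs(swish(x, beta) - relu(x)) for x in grid)
        for beta in (1.0, 10.0, 100.0)]
# the sup-norm gap to ReLU shrinks roughly like 1/beta
assert gaps[0] > gaps[1] > gaps[2]

# unlike ReLU, swish is differentiable at 0 (derivative sigmoid(0) = 0.5)
h = 1e-6
d0 = (swish(h) - swish(-h)) / (2 * h)
assert abs(d0 - 0.5) < 1e-6
```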


Deep Learning Activation Functions: Fixed-Shape, Parametric, Adaptive, Stochastic, Miscellaneous, Non-Standard, Ensemble

Hammad, M. M.

arXiv.org Artificial Intelligence

In the architecture of deep learning models, inspired by biological neurons, activation functions (AFs) play a pivotal role. They significantly influence the performance of artificial neural networks. By modulating the non-linear properties essential for learning complex patterns, AFs are fundamental in both classification and regression tasks. This paper presents a comprehensive review of various types of AFs, including fixed-shape, parametric, adaptive, stochastic/probabilistic, non-standard, and ensemble/combining types. We begin with a systematic taxonomy and a detailed classification framework that delineates the principal characteristics of AFs and organizes them based on their structural and functional distinctions. Our in-depth analysis covers primary groups such as sigmoid-based, ReLU-based, and ELU-based AFs, discussing their theoretical foundations, mathematical formulations, and specific benefits and limitations in different contexts. We also highlight key attributes of AFs such as output range, monotonicity, and smoothness. Furthermore, we explore miscellaneous AFs that do not conform to these categories but have shown unique advantages in specialized applications. Non-standard AFs are also explored, showcasing cutting-edge variations that challenge traditional paradigms and offer enhanced adaptability and model performance. We examine strategies for combining multiple AFs to leverage complementary properties. The paper concludes with a comparative evaluation of 12 state-of-the-art AFs, using rigorous statistical and experimental methodologies to assess their efficacy. This analysis not only aids practitioners in selecting and designing the most appropriate AFs for their specific deep learning tasks but also encourages continued innovation in AF development within the machine learning community.


Is ReLU Adversarially Robust?

Sooksatra, Korn, Hamerly, Greg, Rivas, Pablo

arXiv.org Artificial Intelligence

The efficacy of deep learning models has been called into question by the presence of adversarial examples. Addressing the vulnerability of deep learning models to adversarial examples is crucial for ensuring their continued development and deployment. In this work, we focus on the role of rectified linear unit (ReLU) activation functions in the generation of adversarial examples. ReLU functions are commonly used in deep learning models because they facilitate the training process. However, our empirical analysis demonstrates that ReLU functions are not robust against adversarial examples. We propose a modified version of the ReLU function that improves robustness against adversarial examples, and we support this claim with experiments confirming the effectiveness of the proposed modification. Additionally, we demonstrate that applying adversarial training to our customized model further enhances its robustness compared to a general model.
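ReLU's role in adversarial-example generation can be seen in miniature: wherever the pre-activation is positive, ReLU is locally the identity, so gradients pass through unchanged and a fast-gradient-style perturbation moves the output by the full amount the weights allow. A toy single-neuron sketch (the weights and the FGSM-style attack are illustrative, not the paper's setup):

```python
def relu(z):
    return max(z, 0.0)

w = [1.5, -2.0, 0.5]  # illustrative weights of a single unit f(x) = ReLU(w . x)

def f(x):
    return relu(sum(wi * xi for wi, xi in zip(w, x)))

x = [1.0, 0.2, -0.5]
z = sum(wi * xi for wi, xi in zip(w, x))  # pre-activation (here z > 0)
# when z > 0, ReLU passes the gradient through, so grad_x f = w
grad = [wi if z > 0 else 0.0 for wi in w]

# FGSM-style step: move each coordinate by eps in the gradient's sign direction
eps = 0.1
x_adv = [xi + eps * (1.0 if gi > 0 else -1.0 if gi < 0 else 0.0)
         for xi, gi in zip(x, grad)]
assert f(x_adv) > f(x)  # a small input change produces a larger output
```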


Efficient Quantum Circuits for Machine Learning Activation Functions including Constant T-depth ReLU

Zi, Wei, Wang, Siyi, Kim, Hyunji, Sun, Xiaoming, Chattopadhyay, Anupam, Rebentrost, Patrick

arXiv.org Artificial Intelligence

In recent years, Quantum Machine Learning (QML) has increasingly captured the interest of researchers. Among the components in this domain, activation functions hold a fundamental and indispensable role. Our research focuses on the development of quantum circuits for activation functions, intended for integration into fault-tolerant quantum computing architectures, with an emphasis on minimizing $T$-depth. Specifically, we present novel implementations of ReLU and leaky ReLU activation functions, achieving constant $T$-depths of 4 and 8, respectively. Leveraging quantum lookup tables, we extend our exploration to other activation functions such as the sigmoid. This approach enables us to customize precision and $T$-depth by adjusting the number of qubits, making our results more adaptable to various application scenarios. This study represents a significant advancement towards enhancing the practicality and application of quantum machine learning.
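Independent of the quantum implementation, the function such a ReLU circuit must realize on an $n$-qubit two's-complement register is a sign-controlled copy: output the input when the sign bit is 0, otherwise output 0. A classical sketch of that truth function only (the constant-$T$-depth circuit construction itself is not reproduced here):

```python
def to_bits(v, n=8):
    # two's-complement encoding, most significant (sign) bit first
    return [(v >> (n - 1 - i)) & 1 for i in range(n)]

def from_bits(bits):
    n = len(bits)
    v = sum(b << (n - 1 - i) for i, b in enumerate(bits))
    return v - (1 << n) if bits[0] else v

def relu_bits(bits):
    # controlled copy: pass the value through iff the sign bit is 0
    return [0] * len(bits) if bits[0] else list(bits)

assert from_bits(relu_bits(to_bits(-37))) == 0
assert from_bits(relu_bits(to_bits(42))) == 42
```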